Abstract

In this paper, we consider testing the correlation coefficient matrix between two
subsets of high-dimensional variables. We produce a test statistic by using the
extended cross-data-matrix (ECDM) methodology and show the unbiasedness of the ECDM
estimator. We also show that the ECDM estimator has the consistency property and
the asymptotic normality in high-dimensional settings. We propose a test procedure
based on the ECDM estimator and evaluate its asymptotic size and power theoretically and
numerically. We give several applications of the ECDM estimator. Finally, we
demonstrate how the test procedure performs in actual data analyses by using a microarray
data set.

Keywords: Correlations test; Cross-data-matrix methodology; Graphical modeling; 
Large p, small n; Pathway analysis; RV-coefficient


1 Introduction 

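Suppose we take samples, $x_j$, $j = 1, \ldots, n$, of size $n\ (\geq 4)$, which are independent and
identically distributed (i.i.d.) as a $p$-variate distribution. Here, we consider situations
where the data dimension $p$ is very high compared to the sample size $n$. Let
$x_j = (x_{1j}^T, x_{2j}^T)^T$ and assume $x_{ij} \in R^{p_i}$, $i = 1, 2$, with $p_1 \in [1, p-1]$ and
$p_2 = p - p_1$. We assume that $x_j$ has an unknown mean vector, $\mu = (\mu_1^T, \mu_2^T)^T$, and
unknown covariance matrix,

$$\Sigma = \begin{pmatrix} \Sigma_1 & \Sigma_* \\ \Sigma_*^T & \Sigma_2 \end{pmatrix},$$

that is, $E(x_{ij}) = \mu_i$, $\mathrm{Var}(x_{ij}) = \Sigma_i$, $i = 1, 2$, and
$\mathrm{Cov}(x_{1j}, x_{2j}) = E(x_{1j}x_{2j}^T) - \mu_1\mu_2^T = \Sigma_*$. Let $\sigma_{ij}$ be the $j$-th diagonal
element of $\Sigma_i$ for $i = 1, 2$; $j = 1, \ldots, p_i$, and assume $\sigma_{ij} > 0$ for all $i, j$. We denote the
correlation coefficient matrix between $x_{1j}$ and $x_{2j}$ by $\mathrm{Corr}(x_{1j}, x_{2j}) = P$, where
$P = \mathrm{diag}(\sigma_{11}, \ldots, \sigma_{1p_1})^{-1/2}\, \Sigma_*\, \mathrm{diag}(\sigma_{21}, \ldots, \sigma_{2p_2})^{-1/2}$. Here,
$\mathrm{diag}(\sigma_{i1}, \ldots, \sigma_{ip_i})$ denotes the diagonal matrix of elements $\sigma_{i1}, \ldots, \sigma_{ip_i}$.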

In this paper, we consider testing the correlation coefficient matrix between $x_{1j}$ and
$x_{2j}$ by

$$H_0 : P = O \quad \text{vs.} \quad H_1 : P \neq O \qquad (1)$$

Figure 1: Illustration of the test by (1). One can apply the test to constructing gene
networks.

for high-dimensional settings. When $(p_1, p_2) = (p-1, 1)$ or $(1, p-1)$, (1) implies the test
of correlation coefficients. Aoshima and Yata [1] gave a test statistic for the test of
correlation coefficients and Yata and Aoshima [19] improved the test statistic by using a method
called the extended cross-data-matrix (ECDM) methodology. The test of the correlation
coefficient matrix is a very important tool of pathway analysis or graphical modeling for
high-dimensional data. One of the applications is to construct gene networks. See Figure
1. Drton and Perlman [5] and Wille et al. [16] considered pathway analysis or graphical
modeling of microarray data by testing an individual correlation coefficient. For example,
Wille et al. [16] analyzed gene networks of microarray data with $p = 834$ ($p_1 = 39$ and
$p_2 = 795$) and $n = 118$. On the other hand, Hero and Rajaratnam [8] considered
correlation screening procedures for high-dimensional data by using a test of correlations. Lan et
al. [9] and Zhong and Chen [20] considered tests of regression coefficient vectors in linear
regression models. As for the test of independence, see Fujikoshi et al. [7], Srivastava and
Reid [13] and Yang and Pan [17]. Also, one may refer to Szekely and Rizzo [14], [15] about
distance correlation.

In Section 2, we give several assumptions to construct a high-dimensional correlation
test for (1). In Section 3, we produce a test statistic for (1) by using the ECDM
methodology and show the unbiasedness of the ECDM estimator. We also show that the ECDM
estimator has the consistency property and the asymptotic normality when $p \to \infty$ and
$n \to \infty$. In Section 4, we propose a test procedure for (1) by the ECDM estimator and
evaluate its asymptotic size and power when $p \to \infty$ and $n \to \infty$ theoretically and
numerically. In Section 5, we give several applications of the ECDM estimator. Finally, we
demonstrate how the test procedure performs in actual data analyses by using a microarray
data set.

2 Assumptions 

In this section, we give several assumptions to construct a test procedure for (1). We have
the eigenvalue decomposition of $\Sigma$ by $\Sigma = H\Lambda H^T$, where $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_p)$ having
eigenvalues, $\lambda_1 \geq \cdots \geq \lambda_p \geq 0$, and $H$ is an orthogonal matrix of the corresponding
eigenvectors. Let $x_j = H\Lambda^{1/2}z_j + \mu$, $j = 1, \ldots, n$, where $E(z_j) = 0$ and $\mathrm{Var}(z_j) = I_p$.
Here, $I_p$ denotes the identity matrix of dimension $p$. Note that if $x_j$ is Gaussian, the
elements of $z_j$ are i.i.d. as the standard normal distribution, $N(0, 1)$. We assume the
following model:

$$x_j = \Gamma w_j + \mu, \quad j = 1, \ldots, n, \qquad (2)$$

where $\Gamma$ is a $p \times q$ matrix for some $q > 0$ such that $\Gamma\Gamma^T = \Sigma$, and
$w_j = (w_{1j}, \ldots, w_{qj})^T$, $j = 1, \ldots, n$, are i.i.d. random vectors having $E(w_j) = 0$ and
$\mathrm{Var}(w_j) = I_q$. Let $\Gamma = (\Gamma_1^T, \Gamma_2^T)^T$, where $\Gamma_i = (\gamma_{i1}, \ldots, \gamma_{iq})$ with
$\gamma_{ir} \in R^{p_i}$ for $i = 1, 2$. Then, we have that $x_{ij} = \Gamma_i w_j + \mu_i$ for $i = 1, 2$. Note that
$\Sigma_* = \Gamma_1\Gamma_2^T = \sum_{r=1}^{q} \gamma_{1r}\gamma_{2r}^T$. Also, note that (2) includes the case that
$\Gamma = H\Lambda^{1/2}$ and $w_j = z_j$. Let $\mathrm{Var}(w_{rj}^2) = M_r$, $r = 1, \ldots, q$.
We assume that $\limsup_{p \to \infty} M_r < \infty$ for all $r$. Similar to Bai and Saranadasa [3] and
Aoshima and Yata [2], we assume that

(A-i) $E(w_{rj}^2 w_{sj}^2) = E(w_{rj}^2)E(w_{sj}^2) = 1$ and $E(w_{rj}w_{sj}w_{tj}w_{uj}) = 0$
for all $r \neq s, t, u$.


We assume the following assumption instead of (A-i) as necessary:

(A-ii) $E(w_{r_1 j}^{\alpha_1} w_{r_2 j}^{\alpha_2} \cdots w_{r_v j}^{\alpha_v}) = E(w_{r_1 j}^{\alpha_1}) E(w_{r_2 j}^{\alpha_2}) \cdots E(w_{r_v j}^{\alpha_v})$
for all $r_1 \neq r_2 \neq \cdots \neq r_v \in [1, q]$ and $\alpha_i \in [1, 4]$, $i = 1, \ldots, v$, where $v \leq 8$ and
$\sum_{i=1}^{v} \alpha_i \leq 8$.

See Chen and Qin [4] and Zhong and Chen [20] about (A-ii). Note that (A-ii) implies (A-i).
When $x_j$ is Gaussian, it holds that $\Gamma = H\Lambda^{1/2}$ and $w_j = z_j$ in (2). Note that (A-ii)
is naturally satisfied when $x_j$ is Gaussian because the elements of $z_j$ are independent and
$M_r = 2$ for all $r$. We assume the following assumption for $\Sigma_i$'s as necessary:

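(A-iii) $\min_{i=1,2} \left\{ \dfrac{\mathrm{tr}(\Sigma_i^4)}{\mathrm{tr}(\Sigma_i^2)^2} \right\} \to 0$ as $p \to \infty$.

We note that if $p_i \to \infty$ and $\mathrm{tr}(\Sigma_i^4)/\mathrm{tr}(\Sigma_i^2)^2 \to 0$ as $p \to \infty$, (A-iii) holds even when
$p_{i'}$ is fixed for $i' \neq i$. Also, note that "$\mathrm{tr}(\Sigma_i^4)/\mathrm{tr}(\Sigma_i^2)^2 \to 0$ as $p \to \infty$" is equivalent to
"$\lambda_{\max}(\Sigma_i)/\mathrm{tr}(\Sigma_i^2)^{1/2} \to 0$ as $p \to \infty$". Here, $\lambda_{\max}(\Sigma_i)$ denotes the largest eigenvalue of
$\Sigma_i$. Let $m = \min\{p, n\}$ and $\Delta = \mathrm{tr}(\Sigma_*\Sigma_*^T)$ $(= \|\Sigma_*\|_F^2)$, where $\|\cdot\|_F$ is the Frobenius
norm. We note that $\Delta = 0$ is equivalent to $P = O$. We assume one of the following two
assumptions as necessary: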


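(A-iv) $\dfrac{\mathrm{tr}(\Sigma_1^2)\mathrm{tr}(\Sigma_2^2)}{n^2\Delta^2} \to 0$ as $m \to \infty$;

(A-v) $\limsup_{m \to \infty} \left\{ \dfrac{n^2\Delta^2}{\mathrm{tr}(\Sigma_1^2)\mathrm{tr}(\Sigma_2^2)} \right\} < \infty$.

Note that (A-v) holds under the null hypothesis $H_0$ in (1).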







3 ECDM methodology 


Yata and Aoshima [19] developed the ECDM methodology that is an extension of the
CDM methodology given by Yata and Aoshima [18]. One of the advantages of the ECDM
methodology is to produce an unbiased estimator having small asymptotic variance at a
low computational cost. See Section 2.5 of Yata and Aoshima [19] for the details. In this
section, we give a test statistic for (1) by the ECDM methodology.


3.1 Unbiased estimator by ECDM 

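We consider an unbiased estimator of $\Delta$ by the ECDM methodology. Let $n_{(1)} = \lceil n/2 \rceil$
and $n_{(2)} = n - n_{(1)}$, where $\lceil x \rceil$ denotes the smallest integer $\geq x$. Let

$$V_{n(1)(k)} = \begin{cases} \{\lfloor k/2 \rfloor - n_{(1)} + 1, \ldots, \lfloor k/2 \rfloor\} & \text{if } \lfloor k/2 \rfloor \geq n_{(1)}, \\ \{1, \ldots, \lfloor k/2 \rfloor\} \cup \{\lfloor k/2 \rfloor + n_{(2)} + 1, \ldots, n\} & \text{otherwise;} \end{cases}$$

$$V_{n(2)(k)} = \begin{cases} \{\lfloor k/2 \rfloor + 1, \ldots, \lfloor k/2 \rfloor + n_{(2)}\} & \text{if } \lfloor k/2 \rfloor \leq n_{(1)}, \\ \{1, \ldots, \lfloor k/2 \rfloor - n_{(1)}\} \cup \{\lfloor k/2 \rfloor + 1, \ldots, n\} & \text{otherwise} \end{cases}$$

for $k = 3, \ldots, 2n-1$, where $\lfloor x \rfloor$ denotes the largest integer $\leq x$. Let $\#A$ denote the number
of elements in a set $A$. Note that $\#V_{n(i)(k)} = n_{(i)}$, $i = 1, 2$, $V_{n(1)(k)} \cap V_{n(2)(k)} = \emptyset$ and
$V_{n(1)(k)} \cup V_{n(2)(k)} = \{1, \ldots, n\}$ for $k = 3, \ldots, 2n-1$. Also, note that

$$i \in V_{n(1)(i+j)} \ \text{ and } \ j \in V_{n(2)(i+j)} \ \text{ for } i < j\ (\leq n). \qquad (3)$$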
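The construction of the two index sets can be checked numerically. The following is a minimal Python sketch, assuming 1-based sample indices as in the text; the function names `ecdm_index_sets` and `check_properties` are hypothetical and chosen only for illustration. It builds $V_{n(1)(k)}$ and $V_{n(2)(k)}$ and verifies the stated size, disjointness and covering properties together with (3).

```python
def ecdm_index_sets(n, k):
    """Return (V1, V2), the index sets V_{n(1)(k)} and V_{n(2)(k)}, 1-based."""
    n1 = (n + 1) // 2          # n_(1) = ceil(n / 2)
    n2 = n - n1                # n_(2) = n - n_(1)
    f = k // 2                 # floor(k / 2)
    if f >= n1:
        v1 = list(range(f - n1 + 1, f + 1))
    else:
        v1 = list(range(1, f + 1)) + list(range(f + n2 + 1, n + 1))
    if f <= n1:
        v2 = list(range(f + 1, f + n2 + 1))
    else:
        v2 = list(range(1, f - n1 + 1)) + list(range(f + 1, n + 1))
    return v1, v2


def check_properties(n):
    """Check sizes, disjointness, covering, and property (3) for a given n."""
    n1 = (n + 1) // 2
    n2 = n - n1
    for k in range(3, 2 * n):              # k = 3, ..., 2n - 1
        v1, v2 = ecdm_index_sets(n, k)
        assert len(v1) == n1 and len(v2) == n2
        assert set(v1).isdisjoint(v2)
        assert set(v1) | set(v2) == set(range(1, n + 1))
    for j in range(2, n + 1):              # all pairs i < j
        for i in range(1, j):
            v1, v2 = ecdm_index_sets(n, i + j)
            assert i in v1 and j in v2
    return True
```

For example, with $n = 4$ and $k = 3$ the sketch gives $V_{n(1)(3)} = \{1, 4\}$ and $V_{n(2)(3)} = \{2, 3\}$, so that the pair $(i, j) = (1, 2)$ is split across the two sets.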


Let

$$\bar{x}_{i(1)(k)} = n_{(1)}^{-1} \sum_{j \in V_{n(1)(k)}} x_{ij} \quad \text{and} \quad \bar{x}_{i(2)(k)} = n_{(2)}^{-1} \sum_{j \in V_{n(2)(k)}} x_{ij}$$

for $i = 1, 2$; $k = 3, \ldots, 2n-1$. Let

$$\hat{\Delta}_{ij} = (x_{1i} - \bar{x}_{1(1)(i+j)})^T (x_{1j} - \bar{x}_{1(2)(i+j)}) \, (x_{2i} - \bar{x}_{2(1)(i+j)})^T (x_{2j} - \bar{x}_{2(2)(i+j)})$$

for all $i < j\ (\leq n)$. Then, from (3), we emphasize the following facts:

(i) $x_{1i} - \bar{x}_{1(1)(i+j)}$ and $x_{1j} - \bar{x}_{1(2)(i+j)}$ are independent;

(ii) $x_{2i} - \bar{x}_{2(1)(i+j)}$ and $x_{2j} - \bar{x}_{2(2)(i+j)}$ are independent;

(iii) $E(\hat{\Delta}_{ij}) = \Delta\{(n_{(1)} - 1)(n_{(2)} - 1)\}/(n_{(1)}n_{(2)})$

for all $i < j\ (\leq n)$. Let $u_n = n_{(1)}n_{(2)}/\{(n_{(1)} - 1)(n_{(2)} - 1)\}$. We propose an unbiased
estimator of $\Delta$ by

$$\hat{T}_n = \frac{2u_n}{n(n-1)} \sum_{i<j}^{n} \hat{\Delta}_{ij}.$$

Remark 1. One can save the computational cost of $\hat{T}_n$ by using previously calculated
$\bar{x}_{1(i)(k)}$ and $\bar{x}_{2(i)(k)}$, $k = 3, \ldots, 2n-1$; $i = 1, 2$. Then, the computational cost of $\hat{T}_n$ is of
the order, $O(n^2p)$.


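If one considers a naive estimator of $\Delta$ as $\mathrm{tr}(S_*S_*^T)$ having
$S_* = \sum_{j=1}^{n} (x_{1j} - \bar{x}_{1n})(x_{2j} - \bar{x}_{2n})^T/(n-1)$ with $\bar{x}_{in} = n^{-1}\sum_{j=1}^{n} x_{ij}$, $i = 1, 2$,
it follows that under (A-i)

$$E\{\mathrm{tr}(S_*S_*^T)\} = \Delta + O\left( \frac{\mathrm{tr}(\Sigma_1)\mathrm{tr}(\Sigma_2)}{n} \right).$$

Note that the bias term of $\mathrm{tr}(S_*S_*^T)$ becomes very large as $p$ increases. Srivastava and
Reid [13] considered an estimator of $\Delta$ by

$$\hat{\Delta}_{SR} = \frac{(n-1)^2}{(n-2)(n+1)} \left\{ \mathrm{tr}(S_*S_*^T) - \frac{\mathrm{tr}(S_1)\mathrm{tr}(S_2)}{n-1} \right\}$$

with $S_i$'s the sample covariance matrices when the underlying distribution is Gaussian.
They showed that $E(\hat{\Delta}_{SR}) = \Delta$. However, $\hat{\Delta}_{SR}$ is very biased without the Gaussian
assumption. Contrary to that, the proposed estimator, $\hat{T}_n$, is always unbiased and one
can claim that $E(\hat{T}_n) = \Delta$ without any assumptions.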


Remark 2. We give the following Mathematica algorithm to calculate $\hat{T}_n$:

Input: Sample size n and n × p_i data matrices X[i], i = 1, 2, such as X[i] = $(x_{i1}, \ldots, x_{in})^T$.
Mathematica code:

• n1 = Ceiling[n/2]; n2 = n - n1; u = 2*n1*n2/((n1 - 1)*(n2 - 1)*n*(n - 1))

• V[1, k_, A_] := If[Floor[k/2] >= n1, Take[A, {Floor[k/2] - n1 + 1, Floor[k/2]}],
Join[Take[A, {1, Floor[k/2]}], Take[A, {Floor[k/2] + n2 + 1, n}]]]

• V[2, k_, A_] := If[Floor[k/2] <= n1, Take[A, {Floor[k/2] + 1, Floor[k/2] + n2}],
Join[Take[A, {1, Floor[k/2] - n1}], Take[A, {Floor[k/2] + 1, n}]]]

• Do[M[i, j, k] = Mean[V[j, k, X[i]]], {k, 3, 2*n - 1}, {i, 1, 2}, {j, 1, 2}]

• T = u*Sum[(Part[X[1], i] - M[1, 1, i + j]).(Part[X[1], j] - M[1, 2, i + j])
*(Part[X[2], i] - M[2, 1, i + j]).(Part[X[2], j] - M[2, 2, i + j]), {j, 2, n}, {i, 1, j - 1}]

Then, one obtains T = $\hat{T}_n$.
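For readers without Mathematica, the same steps can be sketched in plain Python. This is an illustrative translation under stated assumptions, not the authors' code: the data are passed as lists of rows (`x1` is $n \times p_1$, `x2` is $n \times p_2$), and all function names (`index_sets`, `subset_mean`, `centered_dot`, `ecdm_cross_statistic`) are hypothetical. As in Remark 1, the subset means are computed once per $k$ and reused.

```python
def index_sets(n, k):
    """Index sets V_{n(1)(k)} and V_{n(2)(k)} with 1-based indices."""
    n1 = (n + 1) // 2
    n2 = n - n1
    f = k // 2
    if f >= n1:
        v1 = list(range(f - n1 + 1, f + 1))
    else:
        v1 = list(range(1, f + 1)) + list(range(f + n2 + 1, n + 1))
    if f <= n1:
        v2 = list(range(f + 1, f + n2 + 1))
    else:
        v2 = list(range(1, f - n1 + 1)) + list(range(f + 1, n + 1))
    return v1, v2


def subset_mean(rows, idx):
    """Column-wise mean of the rows whose 1-based indices are in idx."""
    p = len(rows[0])
    return [sum(rows[j - 1][c] for j in idx) / len(idx) for c in range(p)]


def centered_dot(a, mean_a, b, mean_b):
    """Inner product (a - mean_a)^T (b - mean_b)."""
    return sum((x - ma) * (y - mb) for x, ma, y, mb in zip(a, mean_a, b, mean_b))


def ecdm_cross_statistic(x1, x2):
    """ECDM estimator of Delta = ||Sigma_*||_F^2 from paired samples (n >= 4)."""
    n = len(x1)
    n1 = (n + 1) // 2
    n2 = n - n1
    # u = 2 u_n / (n (n - 1)), the same constant as in the Mathematica code
    u = 2 * n1 * n2 / ((n1 - 1) * (n2 - 1) * n * (n - 1))
    means = {}
    for k in range(3, 2 * n):  # cache the four subset means for each k
        v1, v2 = index_sets(n, k)
        means[k] = (subset_mean(x1, v1), subset_mean(x1, v2),
                    subset_mean(x2, v1), subset_mean(x2, v2))
    total = 0.0
    for j in range(2, n + 1):
        for i in range(1, j):
            m11, m12, m21, m22 = means[i + j]
            total += (centered_dot(x1[i - 1], m11, x1[j - 1], m12)
                      * centered_dot(x2[i - 1], m21, x2[j - 1], m22))
    return u * total
```

Two quick sanity properties follow directly from the definition: the statistic is unchanged when a constant vector is added to every row of `x1`, and it is zero when every row of `x2` is identical.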


3.2 Asymptotic properties of $\hat{T}_n$

We first consider the consistency property of $\hat{T}_n$ in the sense that $\hat{T}_n/\Delta = 1 + o_p(1)$ as
$m \to \infty$.


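Lemma 3.1. Assume (A-i). It holds that as $m \to \infty$

$$\mathrm{Var}(\hat{T}_n) = \Big\{ 4\, \frac{\mathrm{tr}(\Sigma_1\Sigma_*\Sigma_2\Sigma_*^T) + \mathrm{tr}\{(\Sigma_*\Sigma_*^T)^2\} + \sum_{r=1}^{q} (M_r - 2)(\gamma_{1r}^T\Sigma_*\gamma_{2r})^2}{n} + 2 \cdots$$

Remark 3. When the underlying distribution is Gaussian and $\Sigma_* = O$, Srivastava and
Reid [13] showed that as $m \to \infty$

$$\mathrm{Var}(\hat{\Delta}_{SR}) = \frac{2\,\mathrm{tr}(\Sigma_1^2)\mathrm{tr}(\Sigma_2^2)}{n^2} \{1 + o(1)\}$$

under a certain regularity condition which is stronger than (A-iii). However, as for $\hat{T}_n$, one
can claim that $\mathrm{Var}(\hat{T}_n)$ in Lemma 3.1 is asymptotically equivalent to $\mathrm{Var}(\hat{\Delta}_{SR})$ under
(A-iii) and $\Sigma_* = O$.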


Note that $M_r = 2$ for all $r$ when the underlying distribution is Gaussian. From Lemma
3.1, we have the consistency property of $\hat{T}_n$ as follows:

Theorem 3.1. Assume (A-i) and (A-iv). Then, it holds that as $m \to \infty$

$$\frac{\hat{T}_n}{\Delta} = 1 + o_p(1).$$

The consistency property holds under (A-iv). When (A-iv) is not met, we consider the
asymptotic normality of $\hat{T}_n$. Let $\delta = \{2\,\mathrm{tr}(\Sigma_1^2)\mathrm{tr}(\Sigma_2^2)\}^{1/2}/n$. We give the following result.

Lemma 3.2. Assume (A-i), (A-iii) and (A-v). Then, it holds that as $m \to \infty$

$$\frac{\mathrm{Var}(\hat{T}_n)}{\delta^2} = 1 + o(1).$$

From Lemma 3.2 we have the asymptotic normality of $\hat{T}_n$ as follows:

Theorem 3.2. Assume (A-ii), (A-iii) and (A-v). Then, it holds that as $m \to \infty$

$$\frac{\hat{T}_n - \Delta}{\sqrt{\mathrm{Var}(\hat{T}_n)}} = \frac{\hat{T}_n - \Delta}{\delta} + o_p(1) \Rightarrow N(0, 1),$$

where "$\Rightarrow$" denotes the convergence in distribution and $N(0, 1)$ denotes a random variable
distributed as the standard normal distribution.


3.3 Estimation of $\mathrm{tr}(\Sigma_i^2)$

Since $\mathrm{tr}(\Sigma_i^2)$'s are unknown in $\delta$, it is necessary to estimate $\mathrm{tr}(\Sigma_i^2)$'s for constructing a test
for (1). Yata and Aoshima [19] gave an estimator of $\mathrm{tr}(\Sigma_i^2)$, $i = 1, 2$, by

$$W_{in} = \frac{2u_n}{n(n-1)} \sum_{j<j'}^{n} \left\{ (x_{ij} - \bar{x}_{i(1)(j+j')})^T (x_{ij'} - \bar{x}_{i(2)(j+j')}) \right\}^2.$$

Note that $E(W_{in}) = \mathrm{tr}(\Sigma_i^2)$. From Lemma 3.1, we have the following result.

Lemma 3.3. Assume (A-i). Then, it holds as $m \to \infty$ that for $i = 1, 2$


Var( 




4 


. ntr{'S. 


2\2 


q 


i=i 


Remark 4. In Section 2.5 of Yata and Aoshima [19], they compared $W_{in}$ with other
estimators of $\mathrm{tr}(\Sigma_i^2)$ theoretically and computationally. They showed that $W_{in}$ has small
asymptotic variance at a low computational cost.
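The estimator $W_{in}$ has the same structure as $\hat{T}_n$ with both blocks taken from the same subset of variables. A minimal Python sketch is given below, assuming the data for one subset are passed as a list of $n$ rows; the names `index_sets`, `subset_mean` and `ecdm_trace_sq` are hypothetical and used for illustration only.

```python
def index_sets(n, k):
    """Index sets V_{n(1)(k)} and V_{n(2)(k)} with 1-based indices."""
    n1 = (n + 1) // 2
    n2 = n - n1
    f = k // 2
    if f >= n1:
        v1 = list(range(f - n1 + 1, f + 1))
    else:
        v1 = list(range(1, f + 1)) + list(range(f + n2 + 1, n + 1))
    if f <= n1:
        v2 = list(range(f + 1, f + n2 + 1))
    else:
        v2 = list(range(1, f - n1 + 1)) + list(range(f + 1, n + 1))
    return v1, v2


def subset_mean(rows, idx):
    """Column-wise mean of the rows whose 1-based indices are in idx."""
    p = len(rows[0])
    return [sum(rows[j - 1][c] for j in idx) / len(idx) for c in range(p)]


def ecdm_trace_sq(x):
    """ECDM estimator W_n of tr(Sigma^2) from an n x p list of rows (n >= 4)."""
    n = len(x)
    n1 = (n + 1) // 2
    n2 = n - n1
    u = 2 * n1 * n2 / ((n1 - 1) * (n2 - 1) * n * (n - 1))
    means = {}
    for k in range(3, 2 * n):  # subset means are computed once per k
        v1, v2 = index_sets(n, k)
        means[k] = (subset_mean(x, v1), subset_mean(x, v2))
    total = 0.0
    for j in range(2, n + 1):
        for i in range(1, j):
            m1, m2 = means[i + j]
            dot = sum((a - ma) * (b - mb)
                      for a, ma, b, mb in zip(x[i - 1], m1, x[j - 1], m2))
            total += dot ** 2      # squared cross inner product
    return u * total
```

Since every summand is a square, the sketch always returns a nonnegative value, and rescaling the data by a constant $c$ rescales the result by $c^4$, as expected for an estimator of $\mathrm{tr}(\Sigma_i^2)$.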


Let $\hat{\delta} = (2W_{1n}W_{2n})^{1/2}/n$. Then, by combining Theorem 3.2 with Lemma 3.3, we have
the following result.

Corollary 3.1. Assume (A-ii), (A-iii) and (A-v). It holds that as $m \to \infty$

$$\frac{\hat{T}_n - \Delta}{\hat{\delta}} \Rightarrow N(0, 1).$$

Now, we considered an easy example such as $p_1 = p_2$, $\mu = 0$, $\Sigma_1 = \Sigma_2 =$ … and $\Gamma =$ ….
Let $\Sigma_i = H_i\Lambda_iH_i^T$ for $i = 1, 2$, where $\Lambda_i = \mathrm{diag}(\lambda_{i1}, \ldots, \lambda_{ip_i})$
with eigenvalues, $\lambda_{i1} \geq \cdots \geq \lambda_{ip_i}\ (> 0)$, and $H_i$ is an orthogonal matrix of the
corresponding eigenvectors. We considered two cases: (a) $\Delta = 0$ ($x_{1j} = H_1\Lambda_1^{1/2}(w_{1j}, \ldots, w_{p_1 j})^T$
and $x_{2j} = H_2\Lambda_2^{1/2}(w_{p_1+1\,j}, \ldots, w_{pj})^T$), and (b) $\Delta = \lambda_{13}\lambda_{23}$ ($x_{1j} = H_1\Lambda_1^{1/2}(w_{1j}, \ldots, w_{p_1 j})^T$
and $x_{2j} = H_2\Lambda_2^{1/2}(w_{p_1+1\,j}, w_{p_1+2\,j}, w_{3j}, w_{p_1+4\,j}, \ldots, w_{pj})^T$). Here, $x_j$, $j = 1, \ldots, n$, were
generated independently from a pseudorandom normal distribution with mean vector zero
and covariance matrix $\Sigma$ for each case of $(p, n) = (10, 25)$, $(200, 50)$ and $(4000, 150)$. Note
that (A-ii), (A-iii) and (A-v) hold from the fact that $\Delta = O(1)$. In Figure 2, we gave
two histograms of 2000 independent outcomes of $\hat{T}_n/\hat{\delta}$ for (a) and (b) in each case of
$(p, n)$ together with probability densities of $N(0, 1)$ and $N(\Delta/\delta, 1)$. From Corollary 3.1,
we expected that $\hat{T}_n/\hat{\delta}$ is close to $N(0, 1)$ when $\Delta = 0$ and $N(\Delta/\delta, 1)$ when $\Delta \neq 0$.
When $(p, n) = (10, 25)$, the histograms appear far from the probability densities. When
$(p, n) = (200, 50)$, the histogram for (a) fits well the probability density of $N(0, 1)$.
However, the histogram for (b) is still far from the probability density of $N(\Delta/\delta, 1)$. This is
because the convergence in Lemma 3.2 is slow for $\Delta \neq 0$ compared to $\Delta = 0$. As expected,
both the histograms fit well the probability densities when $(p, n) = (4000, 150)$. For other
simulation settings such as $p_1 = p - 1$ and $p_2 = 1$, see Section 2 of Yata and Aoshima [19].


4 Test of high-dimensional correlations 

In this section, we propose a test procedure for (1) in high-dimensional settings.

Figure 2: The solid lines are probability densities of $N(0, 1)$ and $N(\Delta/\delta, 1)$. The
histograms of $\hat{T}_n/\hat{\delta}$ for cases of (a) $\Delta = 0$ and (b) $\Delta \neq 0$ fit the solid lines with increasing
dimension and sample size: $(p, n) = (10, 25)$, $(200, 50)$ and $(4000, 150)$.


4.1 Test procedure for (1)

Let $\alpha \in (0, 1/2)$ be a prespecified constant. From Corollary 3.1, we test (1) by

$$\text{rejecting } H_0 \iff \frac{\hat{T}_n}{\hat{\delta}} > z_\alpha, \qquad (4)$$

where $z_\alpha$ is a constant such that $P\{N(0, 1) > z_\alpha\} = \alpha$. Then, we have the following
result.


Theorem 4.1. Under (A-ii) and (A-iii), the test by (4) has that as $m \to \infty$

size $= \alpha + o(1)$ and power$(\Delta_*) - \Phi(\cdots) = o(1)$,

where $\Phi(\cdot)$ denotes the c.d.f. of $N(0, 1)$ and power$(\Delta_*)$ denotes the power when $\Delta = \Delta_*$
for given $\Delta_*\ (> 0)$.
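The decision rule (4) only needs the two numbers $\hat{T}_n$ and $\hat{\delta}$. A minimal Python sketch follows, assuming both have already been computed; the function name `reject_h0` is hypothetical, and the normal quantile is taken from the standard library.

```python
from statistics import NormalDist


def reject_h0(t_n, delta_hat, alpha=0.05):
    """Rule (4): reject H0 when T_n / delta_hat exceeds the upper alpha point."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)  # P{N(0,1) > z_alpha} = alpha
    return t_n / delta_hat > z_alpha
```

With $\alpha = 0.05$ the threshold is $z_\alpha \approx 1.645$, so the values $\hat{T}_n = 352.5$ and $\hat{\delta} = 7.296$ reported in Section 6 lead to rejection, while a ratio of 1.417 does not.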


When (A-iv) is met, we have the following result from Theorem 3.1.

Corollary 4.1. Assume (A-i). Assume (A-iv) under $H_1$. Then, the test by (4) has for
any $\Delta\ (> 0)$ that as $m \to \infty$

power$(\Delta) = 1 + o(1)$.


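Remark 5. Let

$$K = \left( 4\, \frac{\mathrm{tr}(\Sigma_1\Sigma_*\Sigma_2\Sigma_*^T) + \mathrm{tr}\{(\Sigma_*\Sigma_*^T)^2\} + \sum_{r=1}^{q} (M_r - 2)(\gamma_{1r}^T\Sigma_*\gamma_{2r})^2}{n} + 2\, \frac{\mathrm{tr}(\Sigma_1^2)\mathrm{tr}(\Sigma_2^2) + \Delta^2}{n^2} \right)^{1/2}.$$

Then, from Lemma 3.1, it holds that $\mathrm{Var}(\hat{T}_n)/K^2 \to 1$ as $m \to \infty$ under (A-i) and (A-iii).
Hence, from Theorem 3.2, one may write the power in Theorem 4.1 as

power$(\Delta_*) - \Phi(\Delta_*/K - z_\alpha\delta/K) = o(1)$.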
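The power approximation in Remark 5 is easy to evaluate once $\Delta_*$, $K$ and $\delta$ are given. The following Python sketch assumes these three population quantities are supplied as plain numbers; the function name `asymptotic_power` and its argument names are hypothetical.

```python
from statistics import NormalDist


def asymptotic_power(delta_star, k_scale, delta_sd, alpha=0.05):
    """Evaluate Phi(Delta_* / K - z_alpha * delta / K) from Remark 5."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1.0 - alpha)
    return std_normal.cdf(delta_star / k_scale - z_alpha * delta_sd / k_scale)
```

As a consistency check, setting $\Delta_* = 0$ and $K = \delta$ gives $\Phi(-z_\alpha) = \alpha$, and the value increases with $\Delta_*$ when $K$ and $\delta$ are held fixed.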









































































4.2 Simulation 

In order to study the performance of the test by (4), we used computer simulations. We set
$\alpha = 0.05$, $p_1 = p_2$, $\mu = 0$, $\Sigma_1 = B(0.3^{|i-j|^{1/3}})B$, $\Sigma_2 = B(0.4^{|i-j|^{1/3}})B$ and $\Gamma =$ …,
where

$$B = \mathrm{diag}[\{0.5 + 1/(p_1 + 1)\}^{1/2}, \ldots, \{0.5 + p_1/(p_1 + 1)\}^{1/2}].$$

Note that $\mathrm{tr}(\Sigma_i) = p_i$ ($i = 1, 2$). We set (a) $\Delta = 0$ and (b) $\Delta = \lambda_{13}\lambda_{23}$ that are the same
settings as in Figure 2. We considered three distributions for $x_j$'s: (I) $N_p(0, \Sigma)$, (II) $w_{rj} =
(v_{rj} - 1)/2^{1/2}$ ($r = 1, \ldots, p$) in which $v_{rj}$'s are i.i.d. as the chi-squared distribution with 1
degree of freedom and (III) $w_j$'s are i.i.d. as $p$-variate $t$-distribution, $t_p(\nu)$, with mean zero,
covariance matrix $I_p$ and degrees of freedom $\nu = 10$. Note that (A-ii) is met in (I) and (II).
However, (A-i) (or (A-ii)) is not met in (III). We set p = 2^ (s = 4,..., 11) and n = ^\pi ]. 
We note that (A-iii) and (A-v) hold for (a) and (b). We compared the performance of
$\hat{T}_n/\hat{\delta}$ with $\hat{\Delta}_{SR}/\hat{\delta}_{SR}$ by Srivastava and Reid [13], where $\hat{\delta}_{SR} = \{2W_{1(SR)}W_{2(SR)}\}^{1/2}/n$ having
$W_{i(SR)} = [(n-1)^2/\{(n-2)(n+1)\}]\{\mathrm{tr}(S_i^2) - \mathrm{tr}(S_i)^2/(n-1)\}$, $i = 1, 2$. They showed that
$\hat{\Delta}_{SR}/\hat{\delta}_{SR}$ has the asymptotic normality as $m \to \infty$ when the underlying distribution is
Gaussian and $\Delta = 0$. Also, note that $E(\hat{\Delta}_{SR}) = \Delta$ only under the Gaussian assumption.
Contrary to that, from Corollary 3.1, $\hat{T}_n/\hat{\delta}$ has the asymptotic normality as $m \to \infty$ even
for non-Gaussian situations and $\Delta \neq 0$. Also, one can claim that $E(\hat{T}_n) = \Delta$ without any
assumptions such as (A-i).

In Figure 3, we summarized the findings obtained by averaging the outcomes from 4000
($= R$, say) replications for (I) to (III). Here, the first 2000 replications were generated for
(a) when $\Delta = 0$ and the last 2000 replications were generated for (b) when $\Delta \neq 0$. We
defined $P_r = 1$ (or 0) when $H_0$ was falsely rejected (or not) for $r = 1, \ldots, 2000$, and $H_1$ was
falsely rejected (or not) for $r = 2001, \ldots, 4000$. We gave $\bar{\alpha} = (R/2)^{-1}\sum_{r=1}^{R/2} P_r$ to estimate
the size in the left panels and $1 - \bar{\beta} = 1 - (R/2)^{-1}\sum_{r=R/2+1}^{R} P_r$ to estimate the power in
the right panels. Their standard deviations are less than 0.011. Let $L = \Phi(\Delta/K - z_\alpha\delta/K)$.
From Theorem 4.1 in view of Remark 5, we expected that $\bar{\alpha}$ and $1 - \bar{\beta}$ for (4) are close
to 0.05 and $L$, respectively. In Figure 4, we gave the averages (in the left panels) and the
sample variances (in the right panels) of $\hat{T}_n/\Delta$ and $\hat{\Delta}_{SR}/\Delta$ by the outcomes for (b) when
$\Delta \neq 0$ in cases of (I) to (III). From Remark 5, the asymptotic variance for $\hat{T}_n/\Delta$ was given
by $K^2/\Delta^2$.

From Figures 3 and 4, we observed that $\hat{\Delta}_{SR}$ gives good performances for the Gaussian
case. However, for non-Gaussian cases such as (II) and (III), $\hat{\Delta}_{SR}$ seems not to give
a preferable performance. Especially, it gave quite bad performances for (III). That is
probably because (A-i) (or (A-ii)) is not met in (III). On the other hand, $\hat{T}_n$ gave
adequate performances for high-dimensional cases even in the non-Gaussian situations. We
observed that $\hat{T}_n$ is quite robust against other non-Gaussian situations as well. Hence, we
recommend to use $\hat{T}_n$ for the test of (1) and for the estimation of $\Delta$.

Figure 3: The values of $\bar{\alpha}$ are denoted by the dashed lines in the left panels (Size) and the values
of $1 - \bar{\beta}$ are denoted by the dashed lines in the right panels (Power) for the tests by (4) and $\hat{\Delta}_{SR}/\hat{\delta}_{SR}$
(SR) in cases of (I) to (III). The asymptotic powers were given by $L = \Phi(\Delta/K - z_\alpha\delta/K)$
which was denoted by the solid lines in the right panels.

Figure 4: The averages of $\hat{T}_n/\Delta$ and $\hat{\Delta}_{SR}/\Delta$ are denoted by the dashed lines in the left
panels (Average) and their sample variances, $V(\hat{T}_n/\Delta)$ and $V(\hat{\Delta}_{SR}/\Delta)$, are denoted by the dashed
lines in the right panels (Variance) in cases of (I) to (III). The asymptotic variance of $\hat{T}_n/\Delta$ was given
by $K^2/\Delta^2$ which was denoted by the solid lines in the right panels.

5 Applications

In this section, we give several applications of the results in Section 3. 


5.1 Confidence interval for A 

We construct a confidence interval for $\Delta$ by

$$I = [\max\{\hat{T}_n - z_{\alpha/2}\hat{\delta},\ 0\},\ \hat{T}_n + z_{\alpha/2}\hat{\delta}],$$

where $\alpha \in (0, 1)$. Then, from Corollary 3.1, it holds that as $m \to \infty$

$$P(\Delta \in I) = 1 - \alpha + o(1)$$

under (A-ii), (A-iii) and (A-v). Hence, one can estimate $\Delta$ by $I$. If one considers $\Sigma_0$ as
a candidate of $\Sigma_*$, one can check whether $\Sigma_0$ is a valid candidate or not according as
$\|\Sigma_0\|_F^2 \in I$ or not.
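The interval $I$ can be computed directly from $\hat{T}_n$ and $\hat{\delta}$. A minimal Python sketch is given below; the function name `confidence_interval` is hypothetical, and the lower end is truncated at zero because $\Delta \geq 0$.

```python
from statistics import NormalDist


def confidence_interval(t_n, delta_hat, alpha=0.05):
    """Interval [max(T_n - z_{alpha/2} delta_hat, 0), T_n + z_{alpha/2} delta_hat]."""
    z_half = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    lower = max(t_n - z_half * delta_hat, 0.0)
    upper = t_n + z_half * delta_hat
    return lower, upper
```

For illustration only, plugging in the values $\hat{T}_n = 352.5$ and $\hat{\delta} = 7.296$ from Section 6 with $\alpha = 0.05$ gives an interval of roughly $[338.2, 366.8]$; this interval is not reported in the paper and is meaningful only if (A-v) is a reasonable assumption for those data.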

5.2 Checking whether (A-iv) holds or not 

As discussed in Section 3, $\hat{T}_n$ holds the consistency property when (A-iv) is met, and $\hat{T}_n$
holds the asymptotic normality when (A-v) is met. Here, we propose a method to check
whether (A-iv) holds or not.

Let $\hat{\kappa} = W_{1n}W_{2n}/(n\hat{T}_n)^2$. We have the following result.

Proposition 5.1. Assume (A-i). It holds that as $m \to \infty$

$\hat{\kappa} = o_p(1)$ under (A-iv); $\hat{\kappa}^{-1} = O_p(1)$ under (A-v).

From Proposition 5.1, one can distinguish (A-iv) and (A-v). If $\hat{\kappa}$ is sufficiently small,
one may claim (A-iv), otherwise (A-v).
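Computing $\hat{\kappa}$ is a one-line operation once $W_{1n}$, $W_{2n}$ and $\hat{T}_n$ are available. The Python sketch below uses the hypothetical function name `kappa_hat`. Since $\hat{\delta} = (2W_{1n}W_{2n})^{1/2}/n$, the same quantity can also be written as $\hat{\kappa} = \hat{\delta}^2/(2\hat{T}_n^2)$, which is convenient when only $\hat{T}_n$ and $\hat{\delta}$ are reported.

```python
def kappa_hat(w1, w2, t_n, n):
    """kappa_hat = W_1n * W_2n / (n * T_n)^2 from Section 5.2."""
    return w1 * w2 / (n * t_n) ** 2


def kappa_hat_from_delta(t_n, delta_hat):
    """Equivalent form delta_hat^2 / (2 * T_n^2), using delta_hat = sqrt(2 W1 W2) / n."""
    return delta_hat ** 2 / (2.0 * t_n ** 2)
```

As a check against the data example in Section 6, $\hat{T}_n = 352.5$ and $\hat{\delta} = 7.296$ give $\hat{\kappa} \approx 0.000214$, in agreement with the value reported there.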


5.3 Estimation of the RV-coefficient 

Let $\rho = \Delta/\{\mathrm{tr}(\Sigma_1^2)\mathrm{tr}(\Sigma_2^2)\}^{1/2}$. Here, $\rho$ is the (population) RV-coefficient which is a
multivariate generalization of the squared Pearson correlation coefficient. Note that $\rho \in [0, 1]$.
See Robert and Escoufier [11] for the details. Smilde et al. [12] considered the
RV-coefficient for high-dimensional data.

Let $\hat{\rho} = \hat{T}_n/(W_{1n}W_{2n})^{1/2}$. Then, we have the following result.

Proposition 5.2. Assume (A-i). It holds that as $m \to \infty$

$$\hat{\rho} = \rho + O_p\left( \frac{1}{n} + \left\{ \frac{\mathrm{tr}(\Sigma_1\Sigma_*\Sigma_2\Sigma_*^T)}{\mathrm{tr}(\Sigma_1^2)\mathrm{tr}(\Sigma_2^2)\, n} \right\}^{1/2} \right) = \rho + O_p(n^{-1/2}).$$

Thus, one can estimate the RV-coefficient by $\hat{\rho}$ for high-dimensional data.
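The plug-in estimator $\hat{\rho}$ can be sketched in a few lines of Python. The function names below are hypothetical. Because $\hat{\delta} = (2W_{1n}W_{2n})^{1/2}/n$, one also has $\hat{\rho} = \sqrt{2}\,\hat{T}_n/(n\hat{\delta})$, which allows a check from the summary values alone.

```python
from math import sqrt


def rv_hat(t_n, w1, w2):
    """rho_hat = T_n / sqrt(W_1n * W_2n) from Section 5.3."""
    return t_n / sqrt(w1 * w2)


def rv_hat_from_delta(t_n, delta_hat, n):
    """Equivalent form sqrt(2) * T_n / (n * delta_hat)."""
    return sqrt(2.0) * t_n / (n * delta_hat)
```

With $n = 118$, $\hat{T}_n = 352.5$ and $\hat{\delta} = 7.296$ from Section 6, this gives $\hat{\rho} \approx 0.579$, matching the estimate reported there.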
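5.4 Test of high-dimensional covariance structures

We consider testing

$$H_0 : \Sigma_* = \Sigma_0 \quad \text{vs.} \quad H_1 : \Sigma_* \neq \Sigma_0, \qquad (5)$$

where $\Sigma_0$ is a candidate covariance structure. Let $\Delta_0 = \|\Sigma_* - \Sigma_0\|_F^2$ and

$$\hat{\Delta}_{ij,0} = u_n\hat{\Delta}_{ij} - n_{(1)}(x_{1i} - \bar{x}_{1(1)(i+j)})^T \Sigma_0 (x_{2i} - \bar{x}_{2(1)(i+j)})/(n_{(1)} - 1) - n_{(2)}(x_{1j} - \bar{x}_{1(2)(i+j)})^T \Sigma_0 (x_{2j} - \bar{x}_{2(2)(i+j)})/(n_{(2)} - 1),$$

where $u_n = n_{(1)}n_{(2)}/\{(n_{(1)} - 1)(n_{(2)} - 1)\}$. Note that $E(\hat{\Delta}_{ij,0}) = \|\Sigma_*\|_F^2 - 2\,\mathrm{tr}(\Sigma_*^T\Sigma_0) =
\Delta_0 - \|\Sigma_0\|_F^2$. Then, we consider a test statistic for (5) by

$$\hat{T}_{n,0} = \frac{2}{n(n-1)} \sum_{i<j}^{n} \hat{\Delta}_{ij,0} + \|\Sigma_0\|_F^2.$$

Note that $E(\hat{T}_{n,0}) = \Delta_0$. Let $\Sigma_{*0} = \Sigma_* - \Sigma_0$. Then, we have the following result.

Lemma 5.1. Assume (A-i). Then, it holds that as $m \to \infty$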


Var{fn) = {- 


^ tr(I]iI],oE2Sro) + - 2)(7f,E.o72,)^ 

n 


+ 2L5!)L^li±A}{i + „(!)} + o({M51M5|)}A). 

From Lemma 5.1 and Theorems 3.1 and 3.2, we have the following results.

Corollary 5.1. Assume (A-i). Assume also (A-iv) with $\Delta = \Delta_0$. Then, it holds that as
$m \to \infty$

$$\frac{\hat{T}_{n,0}}{\Delta_0} = 1 + o_p(1).$$

Corollary 5.2. Assume (A-ii), (A-iii) and (A-v). Assume also (A-v) with $\Delta = \Delta_0$.
Then, it holds that as $m \to \infty$

Tnfl — Aq 


A(0,1). 


Hence, one can apply $\hat{T}_{n,0}$ to a test for (5).

Figure 5: Illustration of the isoprenoid gene network given by Figure 2 in Wille et al. [16]
and the additional genes, where DXPS1, PPDS1 and so on are names of genes. DPPS2
is connected with both MEP pathway and MVA pathway. Other genes of mitochondrion
are not connected with either MEP pathway or MVA pathway.

6 Example 

In this section, we demonstrate how the proposed test procedures perform in actual data
analyses by using a microarray data set. We analyzed gene expression data of Arabidopsis
thaliana given by Wille et al. [16] in which the data set consists of 118 samples having
834 ($= p$) genes: 39 ($= p_1$) isoprenoid genes and 795 ($= p_2$) additional genes. All the data
were logarithmically transformed. Wille et al. [16] considered a genetic network between the
two gene sets. By using a graphical Gaussian modeling, they constructed an isoprenoid
gene network given in Figure 2 of [16]. In Figure 5, we gave the illustration of the isoprenoid
gene network and the additional genes. We first considered testing (1) by using (4). See
Figure 1 for the illustration. Let $\alpha = 0.05$. We calculated $\hat{T}_n = 352.5$ and $\hat{\delta} = 7.296$, so
that $\hat{T}_n/\hat{\delta} = 48.3$. From (4) and $z_\alpha = 1.645$, we rejected $H_0$. Thus we concluded that the two
networks have some connections. In addition, we calculated $\hat{\kappa} = 0.000214$. Thus, with the
help of Proposition 5.1 one may conclude that (A-iv) is met, so that the power of the test
is 1 asymptotically and $\hat{T}_n/\Delta = 1 + o_p(1)$ from Theorem 3.1 and Corollary 4.1. Also, with
the help of Proposition 5.2 we obtained $\hat{\rho} = 0.579$ as an estimate of the RV-coefficient.

Next, we considered testing (1) between some part of the isoprenoid genes and the
additional genes. The isoprenoid genes consisted of three types as MEP pathway (19
genes), MVA pathway (15 genes) and mitochondrion (5 genes). See [16] for the details.
From Figure 5 we expected that (i) the correlation between DPPS2 and the additional
genes is high, and (ii) the correlation between the genes of mitochondrion (except DPPS2)
and the additional genes is low. We set $x_{2j}$ as the additional genes ($p_2 = 795$). We
considered three tests for $x_{1j}$: (a) the genes of mitochondrion ($p_1 = 5$); (b) DPPS2
($p_1 = 1$); and (c) UPPS1, GGPPS1,5,9 ($p_1 = 4$). By using the first 50 samples ($n = 50$)
of the 118 samples, we constructed (1). Then, with $\alpha = 0.05$, we rejected $H_0$ for (a) since
$\hat{T}_n/\hat{\delta} = 12.27$ and for (b) since $\hat{T}_n/\hat{\delta} = 13.23$. On the other hand, we accepted $H_0$ for (c)
since $\hat{T}_n/\hat{\delta} = 1.417$.

Similar to Section 5 in Yata and Aoshima [19], we considered a high-dimensional linear
regression model:

$$Y = X\Theta + E,$$

where $Y$ is an $n \times p_2$ response matrix, $X$ is an $n \times k$ fixed design matrix, and $\Theta$ is a
$k \times p_2$ parameter matrix. The $n$ rows of $E$ are independent and identically distributed
as a $p_2$-variate distribution with mean vector zero. Let $x_{1j}$ be the $j$th sample of the 35
isoprenoid genes (except UPPS1, GGPPS1,5,9). Let $x_{1(j)} = (1, x_{1j}^T)^T$, $j = 1, \ldots, 118$. We
set $Y = [x_{21}, \ldots, x_{2n}]^T$ and $X = [x_{1(1)}, \ldots, x_{1(n)}]^T$ with $k = 36$. We noted that the standard
elements of $\Theta$ are path coefficients from the isoprenoid genes to the additional genes. By
using the observed samples of size $n = 50$ as a training data set, we obtained the least
squares estimator of $\Theta$ by $\hat{\Theta} = (X^TX)^{-1}X^TY$. We investigated prediction accuracy of
the regression with $\hat{\Theta}$ by using the remaining samples of size 68 ($= 118 - 50$) as a test data
set. We considered the prediction mean squared error (PMSE) by $E(\|x_{2j} - \hat{\Theta}^Tx_{1(j)}\|^2 \mid \hat{\Theta})$.
By using the test samples $x_{1(j)}$ and $x_{2j}$, $j = 51, \ldots, 118$, we applied the bias-corrected and
accelerated (BCa) bootstrap by Efron [6]. Then, we constructed a 95% confidence interval
(CI) of the PMSE by [837.6, 1189.5] from 10000 replications. On the other hand, we
considered the PMSE for the full isoprenoid genes (39 genes). Then, similar to the above, we
constructed a 95% CI of the PMSE by [1088.7, 1581.3]. The PMSE by the 35 isoprenoid
genes is probably smaller than that of the full isoprenoid genes. Thus we conclude that
the test procedure by (4) works effectively for this data set.

A Proofs 

Throughout, we assume that = 0 and /X 2 = 0 without loss of generality. Let T = 
tr(Si5]*S2^D) = tr(5]^)tr(5]|) and Q = tr( 5 ]^)tr( 5 ] 2 ). Note that 

< E(^S5^*72i)' = T; 

i=l i,j 

<? q 

tr{(S,S^)2} = < ^(7u^*72jf = T; 

^ = '^i7li7ij72i72j) < { Yl{7u7ijf} ' { X](7^i72j)^} ' = 
ij i,j i,j 

= '^{7ii'^i7ij){72i'^272j) < { '^i7u'^i7ijf} ‘ { '^{7li'^272j?] ' 
ij i,j i,j 

= and 

'^{7u'^l7li){72i^272i) < ' {^h2i^272if] ' 

2=1 2=1 2=1 

< 0^/2 < 4 ^ ( 6 ) 

from the fact that tr(Sf) < tr(5]?)^ for i = 1, 2. Then, we note that = 0(T/n^-|-T/n), 
where K is given in Remark 5. Let yij = Un^ij—^ and Sij = x'^^xijx'^-X 2 j — /S. for all i < j. 
Notethatf„-A = 2YJl^-yij/{n{n-l)}. Let = Y^l^,Y.t=i7ir7it72s72tWriWsi{wjj- 
1 )> V’ii = Er,t7Tr7lt7i’r72t(w^ri-l)(^tj-l) auduJij = Yll^sll^ullrlltl2sl2uWriWsiWtjWuj
for all i / j. Note that E{ujij) = 0 for all i ^ j and E{uJijUJiij) = 0 for all i ^ i' ^ j. 
Let Un = - 1)}> “ 1)1 ^nd B = E{ujfj) {i 7 ^ j). 

Let 51^ — ®i(i)(i+'/))(® 2 i “ ® 2 (i)(i+j)) /(^(i) ~ 1) s-nd 51^ ,^^ 2 ) ^( 2 )(®ij “ 

®i( 2 )(i+i))(® 2 j - * 2 ( 2 )(i+i))’^/(f^( 2 ) - 1 ) for all i < j. 

Proof of Lemma 3.1. We write that 

Vij ^*)(^*,ij(2) ^*) } 

+ + fo(^*,ii(2)^r) “ 2A^; 

£ij =ujij + rjij + r]ji + i>ij + tr(iCii£c^jSj) + - 2A^ (7) 

for all i < j. Note that all the terms of Eij in ([7|) are uncorrelated under (A-i). From ([ 6 |), 
it holds that under (A-i) 

= o{J2(iljuili2tf) = o( j;7r.5]i7i.7i;5]272.) = 

r,t r=l 

for all i / j. Similarly, under (A-i), it holds that E{'qf-) = 0(0^/^) for all i 7 ^ j. Then, 
we have that under (A-i) 

E{e%) = ^ + A2 + C>(T + 0 ^ 2 ) for all i < j; 

E{£ij£ik) = E{£ik£jk) = Var{tr(a;iia;JjZ:^)} 

<? 

= T + tr{(I],5]J)2} + ^(M, - 2)(7f,5],72,)2 for all i<j<k- 

r=l 

and E{sijeki) = 0 for all i < j and k <l\ i ^ j ^ k ^ 1. 

Then, we have that as m ^ 00 

Var([/„) = E{Ul) = K^l + o(l)} + 0{Ll^/^/r?) = 0{K^) ( 8 ) 

under (A-i). On the other hand, we have that as n ^ 00 

E{{yij - £ij f} = 0{d>/n) for all i < j; 

E{{yij - £ij){yik - £ik)} = 0{^/n^ + T/n} 

E{{yik - £ik){yjk - ^jk)} = 0{di/ri^+ T/n} for alH < j < A;; and 

E{{yij - eij){yki - £ki)} = 0{^/n^ + T/n^} 

for all i < j and k <1; i ^ j ^ k ^ I 

under (A-i). Then, we have that as $m \to \infty$

$$\mathrm{Var}(U_n - \hat{T}_n) = E[\{U_n - (\hat{T}_n - \Delta)\}^2] = o(K^2) \qquad (9)$$

under (A-i). Hence, by combining (8) with (9), we have that as $m \to \infty$

Var(T;) = E[{{Tn - A) - Un + Un}^] 

= Var(H„) + Var(C/„ - f,) - 2E[{Un - (f„ - A)}C/„] 

= K^l + o{l)}+ 0{n^/^/n^) 

under (A-i) from the fact that \E[{Un — {Tn — A)}C/„]| < {Var(f7„ — T„)Var([/n)}^/^ by 
Schwarz’s inequality. It concludes the result. □ 

Proof of Lemma 3.2. Let r* = rank(5]^' S*). When we consider the singular value de¬ 
composition of it follows that where A*i > • • • > 

1 /2 

X*r,{> 0) denote singular values of Eif S*, and (or ^*^( 2 )) denotes a unit left- (or 

right-) singular vector corresponding to (j = 1, ...,r*). Then, it holds that 

r* r* 

i=i i=i 

r* r* 

= '^*j^*j(2)^2^*j(2) A Amax(^2) ^ ^ A^j = Amax(^2)tr(^* 5]iX)*). 

i=i i=i 

Similarly, we claim that tr(s3"5]iS*) < A Tnax (^i )tr(S^X)^.) = A max (Si )A, so that 

Tf' ^ Amax(^l)Amax(^2 )A. (10) 

Thus under (A-iii), it holds that T = o(A'I'^/^) as p —)• oo. Then, we claim that reT/T = 
o(nA/T^/^) as p —)> oo, so that under (A-iii) and (A-v) 

nT/'I' = o(l) as m —> oo. (11) 

By noting that X]?=i(7ii^*72j)^ A 7 and tr{(5]*s3")^} < T, from Lemma 3.1 and (fTTI) . 
we have that as m —>■ oo 

Var(f„)/(l2 = 1 + o(l) 

under (A-i), (A-iii) and (A-v) from the fact that A^/'I' = o(l) as p —>■ oo under (A-v). □ 

Proof of Theorem 3.1. From (fTO]) . it holds that T < T^/^A, so that = 0(T/n^ -|- 
T^/^A/n). Then, from Lemma 3.1 and < ^, it holds that as m —)• oo 

Var(r„/A) = 0{T/(n2A2) + ^!^/‘^/{nA)} 

under (A-i). Thus, under (A-iv), from Chebyshev’s inequality, we can claim the result. □ 
We give the following lemmas to prove Theorem 3.2.


Lemma A.l. It holds that 

for all i,i' ^ j; and 

jOJijfUJifj/^ — 0(f2) for all i i ^ j ^ j 

under (A-ii). 


Proof. We first consider the first result of Lemma A.l. Let Crstu = l\r'I\tl 2 s'l 2 u s-ll 
r^SjtjU. Let Ai — Crstu (Crstu T Csrtu T Crsut T Csrut^UJri^si^tjWuj A 2 — 

iofj — Ai. Note that E{Ai) = B and E{A 2 ) = 0 under (A-ii). Here, we claim that 
Er^s E?^u(Crstu + Csrtu + Crsut + Csrut) = C»(^'), SO that 
q q 

EE(K''"“i + iCsriul H“ I Crsut I “1“ | Csrut |)2 = 0(T). 

r^s t^u 

Then, under (A-ii), we have that 

q q 2 

-^(^l) — -^1 f y ^ (ICrstuI + ICsrtuI + ICrsutI + ICsrutI) ^^.jWf-jW | 

r^s t^u 

= 0(^2). (12) 

For $E(A_2^4)$, it is necessary to consider the terms involving $w_{ri}w_{r'i}w_{r''i}$ ($r \ne r' \ne r''$), because the expectations of such products do not vanish unless $E(w_{ri}^3) = 0$ or $E(w_{r'i}^3) = 0$. Here, under (A-ii), we evaluate these terms, for sufficiently large $C > 0$, as

[chain of inequalities lost in extraction] $= O(\Psi^2)$

from the fact that $|E(w_{rj}^3)| \le \{E(w_{rj}^2)E(w_{rj}^4)\}^{1/2}$ for all $r$. Similarly, for other terms, we can evaluate the order as $O(\Psi^2)$. Hence, we can claim that $E(A_2^4) = O(\Psi^2)$ under (A-ii), so that

$$E(\omega_{ij}^4) = O\{E(A_1^4) + E(A_2^4)\} = O(\Psi^2)$$
from (12). By noting that $E(\omega_{ij}^2\omega_{i'j}^2) \le \{E(\omega_{ij}^4)E(\omega_{i'j}^4)\}^{1/2}$, we can conclude the first result of Lemma A.1.

Next, we consider the second result of Lemma A.1. From (6), under (A-ii), we can evaluate that

$$E(\omega_{ij}\omega_{i'j}\omega_{ij'}\omega_{i'j'}) = O(\Psi^2) \quad \text{for all } i \ne i' \ne j \ne j'.$$

It concludes the second result of Lemma A.1. The proof is completed. □

Lemma A.2. It holds that as $m \to \infty$

$$\mathrm{Var}(\hat{T}_n - V_n) = o(\delta^2)$$

under (A-i), (A-iii) and (A-v).

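Proof. From (7), we have that under (A-i)

$$E\{(\omega_{ij} - \varepsilon_{ij})^2\} = O(\Upsilon + [\ldots]) \quad \text{for all } i \ne j;$$

$$E\{(\omega_{ij} - \varepsilon_{ij})(\omega_{ik} - \varepsilon_{ik})\} = O(\Upsilon) \quad \text{for all } i \ne j \ne k,$$

$$\text{and } E\{(\omega_{ij} - \varepsilon_{ij})(\omega_{kl} - \varepsilon_{kl})\} = 0 \quad \text{for all } i \ne j \ne k \ne l.$$

Then, from (11), we have that as $m \to \infty$

$$\mathrm{Var}(U_n - V_n) = O(\Upsilon/n + [\ldots]) = o(\delta^2) \qquad (13)$$

under (A-i), (A-iii) and (A-v). By combining (13) with (9), from the fact that $\mathrm{Var}(\hat{T}_n - V_n) = O\{\mathrm{Var}(\hat{T}_n - U_n) + \mathrm{Var}(U_n - V_n)\}$, we can conclude the result. □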
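
The variance decomposition used in the last step follows from an elementary inequality, sketched here for completeness. $X$ and $Y$ are generic placeholders introduced for this illustration; in the proof they would be taken as $X = \hat{T}_n - U_n$ and $Y = U_n - V_n$.

```latex
\begin{align*}
\mathrm{Var}(X + Y)
  &= \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\,\mathrm{Cov}(X, Y) \\
  &\le \mathrm{Var}(X) + \mathrm{Var}(Y)
       + 2\{\mathrm{Var}(X)\,\mathrm{Var}(Y)\}^{1/2}
     && \text{(Schwarz's inequality)} \\
  &\le 2\{\mathrm{Var}(X) + \mathrm{Var}(Y)\}
     && \text{(since } 2ab \le a^2 + b^2 \text{)}.
\end{align*}
```

With $X + Y = \hat{T}_n - V_n$, this gives $\mathrm{Var}(\hat{T}_n - V_n) \le 2\{\mathrm{Var}(\hat{T}_n - U_n) + \mathrm{Var}(U_n - V_n)\}$, so both terms being $o(\delta^2)$ is enough.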

Proof of Theorem 3.2. Let $v_j = 2\{n(n-1)\}^{-1}\sum_{i=1}^{j-1}\omega_{ij}$, $j = 2, \ldots, n$. Note that $\sum_{j=2}^{n}v_j = 2\sum_{i<j}\omega_{ij}/\{n(n-1)\} = V_n$ and

$$\mathrm{Var}\Big(\sum_{j=2}^{n}v_j\Big) = \frac{4\sum_{i<j}E(\omega_{ij}^2)}{n^2(n-1)^2} = \frac{2\Xi}{n(n-1)}.$$

Here, we have for $j = 3, \ldots, n$, that $E(v_j \mid v_{j-1}, \ldots, v_2) = 0$. Then, we consider applying the martingale central limit theorem given by McLeish [10]. Let $\zeta_j = v_j/[2\Xi/\{n(n-1)\}]^{1/2}$, $j = 2, \ldots, n$. Note that $\sum_{j=2}^{n}E(\zeta_j^2) = 1$ and $\mathrm{Var}(\sum_{j=2}^{n}\zeta_j) = 1$. Let $I(\cdot)$ denote the indicator function. From (6), under (A-ii) and (A-iii), it holds that as $p \to \infty$

$$\Xi = \Psi\{1 + o(1)\} + \Delta^2. \qquad (14)$$

Then, by using Chebyshev's inequality and Schwarz's inequality, from Lemma A.1, under (A-ii) and (A-iii), it holds for Lindeberg's condition that as $m \to \infty$

$$\sum_{j=2}^{n}E\{\zeta_j^2 I(\zeta_j^2 \ge \tau)\} \le \sum_{j=2}^{n}\frac{E(\zeta_j^4)}{\tau} \to 0 \qquad (15)$$

for any $\tau > 0$. Here, from Lemma A.1, (14) and (15), under (A-ii) and (A-iii), we evaluate that as $m \to \infty$

$$\sum_{j=2}^{n}E[\{\zeta_j^2 - E(\zeta_j^2)\}^2] \le \sum_{j=2}^{n}E(\zeta_j^4) \to 0; \text{ and}$$

$$\sum_{2 \le i < j \le n}E[\{\zeta_i^2 - E(\zeta_i^2)\}\{\zeta_j^2 - E(\zeta_j^2)\}] = o(1),$$

so that

$$\mathrm{Var}\Big(\sum_{j=2}^{n}\zeta_j^2\Big) \to 0. \qquad (16)$$

Then, by using the martingale central limit theorem, from (15) and (16), under (A-ii) and (A-iii), we obtain that as $m \to \infty$

$$\sum_{j=2}^{n}\zeta_j = \frac{V_n}{\sqrt{2\Xi/\{n(n-1)\}}} \Rightarrow N(0, 1). \qquad (17)$$

Note that $\delta/[2\Xi/\{n(n-1)\}]^{1/2} \to 1$ as $m \to \infty$ under (A-ii), (A-iii) and (A-v). Then, by combining (17) with Lemmas 3.2 and A.2, we have that as $m \to \infty$

$$\frac{\hat{T}_n - \Delta}{\sqrt{\mathrm{Var}(\hat{T}_n)}} = \frac{V_n}{\delta} + o_p(1) = \frac{V_n}{\sqrt{2\Xi/\{n(n-1)\}}} + o_p(1) \Rightarrow N(0, 1) \qquad (18)$$

under (A-ii), (A-iii) and (A-v). It concludes the result. □
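
For readers checking the Lindeberg step, the following sketch records the standard pointwise bound that controls a truncated second moment by a fourth moment. $Z$ is a generic random variable with $E(Z^4) < \infty$ and $\tau > 0$ is a generic constant; both are placeholders introduced here, and the sketch is not claimed to be the exact route taken in the proof.

```latex
\begin{align*}
Z^{2}\, I(Z^{2} \ge \tau)
  &\le Z^{2} \cdot \frac{Z^{2}}{\tau}\, I(Z^{2} \ge \tau)
     && \text{(on the event } Z^{2} \ge \tau,\ Z^{2}/\tau \ge 1 \text{)} \\
  &\le \frac{Z^{4}}{\tau}, \\
\text{so that}\quad
E\{Z^{2}\, I(Z^{2} \ge \tau)\}
  &\le \frac{E(Z^{4})}{\tau}.
\end{align*}
```

Summing such a bound over the martingale differences reduces Lindeberg's condition to showing that the sum of their fourth moments vanishes, which is where a fourth-moment result such as Lemma A.1 enters.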


Proof of Lemma 3.3. By Lemma 3.1 after replacing $(\Sigma_2, \gamma_{2j}, \Sigma_*, \Delta)$ with $(\Sigma_1, \gamma_{1j}, \Sigma_1, \mathrm{tr}(\Sigma_1^2))$, we can conclude the result when $i = 1$. When $i = 2$, we have the result similarly. Thus the proof is completed. □

Proof of Corollary 3.1. By combining Theorem 3.2 with Lemma 3.3, we can conclude the 
result. □ 

Proofs of Theorem 4.1 and Corollary 4.1. We first consider the proof of Corollary 4.1. From Theorem 3.1, under (A-i) and (A-iv), we have that as $m \to \infty$

$$P\Big(\frac{\hat{T}_n}{\hat{\delta}} > z_\alpha\Big) = P\Big(\frac{\hat{T}_n}{\Delta} > \frac{z_\alpha\hat{\delta}}{\Delta}\Big) = P(1 + o_p(1) > o_p(1)) \to 1$$

from the fact that $\hat{\delta}/\Delta = o_p(1)$ as $m \to \infty$ under (A-i) and (A-iv). It concludes the result of Corollary 4.1.

Next, we consider the proof of Theorem 4.1. From Corollary 3.1, under (A-ii), (A-iii) and (A-v), we have that as $m \to \infty$

$$P\Big(\frac{\hat{T}_n}{\hat{\delta}} > z_\alpha\Big) = P\Big(\frac{\hat{T}_n - \Delta}{\delta} > \frac{z_\alpha\hat{\delta}}{\delta} - \frac{\Delta}{\delta}\Big) = \Phi\Big(\frac{\Delta}{\delta} - z_\alpha\Big) + o(1)$$

from the fact that $\hat{\delta}/\delta = 1 + o_p(1)$ as $m \to \infty$ under (A-ii). We can conclude the results of size and power when (A-v) is met in Theorem 4.1. We note that $\Phi(\Delta/\delta - z_\alpha) \to 1$ as $m \to \infty$ under (A-iv), so that we obtain the result of power when (A-iv) is met from Corollary 4.1. Hence, by considering the convergent subsequence of $\Delta/\delta$, we can conclude the result of power in Theorem 4.1. The proofs are completed. □

Proof of Proposition 5.1. We first consider the case when (A-iv) is met. From Theorem 3.1 and Lemma 3.3, under (A-i) and (A-iv), it holds that as $m \to \infty$

[display lost in extraction]

It concludes the result when (A-iv) is met.

Next, we consider the case when (A-v) is met. From (10), it holds that $\Upsilon \le \lambda_{\max}(\Sigma_1)\lambda_{\max}(\Sigma_2)\Delta \le \Psi^{1/2}\Delta$, so that $n\Upsilon/\Psi = O(1)$ as $m \to \infty$ under (A-v). Then, from Lemma 3.1 and (6), under (A-i) and (A-v), we claim that $\mathrm{Var}(\hat{T}_n) = O(\Psi/n^2)$ as $m \to \infty$. Note that $\Delta = O(\Psi^{1/2}/n)$ as $m \to \infty$ under (A-v). Thus under (A-i) and (A-v), it holds that $\hat{T}_n = \Delta + O_p(\Psi^{1/2}/n) = O_p(\Psi^{1/2}/n)$ as $m \to \infty$. Then, from Lemma 3.3, under (A-i) and (A-v), we have that as $m \to \infty$

[display lost in extraction]

It concludes the result when (A-v) is met. The proof is completed. □

Proof of Proposition 5.2. By combining Lemmas 3.1 and 3.3, we can conclude the result. 

□ 

Proof of Lemma 5.1. Let $\varepsilon_{ij,0} = \varepsilon_{ij} - x_{1i}^T\Sigma_0x_{2i} - x_{1j}^T\Sigma_0x_{2j} + 2\mathrm{tr}(\Sigma_*\Sigma_0^T)$ for all $i < j$ [the companion definition given alongside it is lost in extraction]. From (7), we write that

[expansion lost in extraction]

for all $i < j$. Then, in a way similar to the proof of Lemma 3.1, we can conclude the result. □

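Proof of Corollary 5.1. Let $\Upsilon_0 = \mathrm{tr}(\Sigma_1\Sigma_{*0}\Sigma_2\Sigma_{*0}^T)$. Similarly to (10), it holds that

$$\Upsilon_0 \le \lambda_{\max}(\Sigma_1)\lambda_{\max}(\Sigma_2)\Delta_0 \le \Psi^{1/2}\Delta_0.$$

Then, by noting that $\sum_{j=1}^{q}(\gamma_{1j}^T\Sigma_{*0}\gamma_{2j})^2 \le \Upsilon_0$ and $\mathrm{tr}\{(\Sigma_{*0}\Sigma_{*0}^T)^2\} \le \Upsilon_0$, from Lemma 5.1, we have that as $m \to \infty$

$$\mathrm{Var}(\hat{T}_{n,0}/\Delta_0) = O\{\Psi/(n^2\Delta_0^2) + \Psi^{1/2}/(n\Delta_0)\}$$

under (A-i). Thus, under (A-iv) with $\Delta = \Delta_0$, from Chebyshev's inequality, we can claim the result of Corollary 5.1. □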
Proof of Corollary 5.2. Similarly to the proof of Lemma A.2, under (A-i), (A-iii), (A-v) and (A-v) with $\Delta = \Delta_0$, we can claim that $\mathrm{Var}(\hat{T}_{n,0} - V_n) = o(\delta^2)$ as $m \to \infty$. Thus, similar to (18), from Lemma 3.3, we can conclude the result. □